A Framework for Testing Properties of Discrete Distributions: Monotonicity, Independence, and More

Authors

  • Jayadev Acharya
  • Constantinos Daskalakis
  • Gautam Kamath
Abstract

Given data sampled from an unknown discrete probability distribution p, does the underlying distribution possess some property of interest? For instance, is the distribution uniform? Monotone? Are its marginals independent? This class of problems is one of the most fundamental questions in statistics, where it is known as hypothesis testing. Classical work on this problem has focused on distributions over a domain of a fixed size, as the number of samples goes to infinity. However, in modern scenarios, distributions may be over massive domains, and we are often limited by the number of samples or computational power. As such, over the past two decades, there has been intense study with these goals in mind (see [2] for a survey). Nevertheless, even for many basic properties of distributions, the optimal sample complexity was unknown. We provide a testing framework which achieves the optimal sample complexity for the properties mentioned above, and more. Notably, all properties we study have strongly sublinear complexities, requiring only a number of samples proportional to the square root of the domain size. The framework follows a conceptually simple learn-then-test approach. Naively, such methods seemed to be intrinsically statistically inefficient, due to an information-theoretic lower bound for robust ℓ₁ identity testing [3]. We bypass this lower bound by using χ² distance as an intermediary metric. χ² is a non-uniform rescaling of ℓ₂, and is more “punishing” than ℓ₁. This makes learning in χ² slightly harder, but testing in χ² drastically easier. This work appeared at NIPS’15 as [1].

BODY Wanna optimally test if your distribution has a property? Assume it does, learn the distribution, and test the hypothesis. Use χ² distance!
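The relationship among the three distances named in the abstract can be illustrated numerically. The sketch below assumes the standard definition χ²(p, q) = Σᵢ (pᵢ − qᵢ)² / qᵢ, which the abstract itself does not spell out; the function names and the two example distributions are hypothetical and chosen only for illustration. Dividing each squared difference by qᵢ is the "non-uniform rescaling" of the squared ℓ₂ distance, and by the Cauchy-Schwarz inequality the squared ℓ₁ distance never exceeds χ².

```python
def l1_distance(p, q):
    """Sum of absolute differences between two probability vectors."""
    return sum(abs(a - b) for a, b in zip(p, q))


def l2_squared(p, q):
    """Squared Euclidean distance between two probability vectors."""
    return sum((a - b) ** 2 for a, b in zip(p, q))


def chi2_distance(p, q):
    """Chi-squared distance, assumed here to be sum_i (p_i - q_i)^2 / q_i.

    Each squared difference is rescaled by 1 / q_i, so deviations on
    low-probability elements of q are weighted more heavily.
    Requires q_i > 0 for every i.
    """
    return sum((a - b) ** 2 / b for a, b in zip(p, q))


# Hypothetical example distributions over a domain of size 3.
q = [0.7, 0.2, 0.1]
p = [0.6, 0.2, 0.2]

print("l1^2 :", l1_distance(p, q) ** 2)
print("l2^2 :", l2_squared(p, q))
print("chi2 :", chi2_distance(p, q))
```

In this example both the squared ℓ₁ and the squared ℓ₂ distances are smaller than the χ² distance, which is one way to read the abstract's remark that χ² is more "punishing".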

Similar resources

On discrete a-unimodal and a-monotone distributions

Unimodality is one of the structural features of distributions that, like skewness, kurtosis and symmetry, is visible in the shape of a function. Comparing two different distributions can be a very difficult task. But if both distributions are of the same type, for example both unimodal, we may simply compare their modes, dispersions and skewness. So, the concept of unimodali...

Classification and properties of acyclic discrete phase-type distributions based on geometric and shifted geometric distributions

Acyclic phase-type distributions form a versatile model, serving as approximations to many probability distributions in various circumstances. They exhibit special properties and characteristics that usually make their applications attractive. Compared to acyclic continuous phase-type (ACPH) distributions, acyclic discrete phase-type (ADPH) distributions and their subclasses (ADPH family) have ...

Optimal Testing for Properties of Distributions

Given samples from an unknown distribution p, is it possible to distinguish whether p belongs to some class of distributions C versus p being far from every distribution in C? This fundamental question has received tremendous attention in statistics, focusing primarily on asymptotic analysis, and more recently in information theory and theoretical computer science, where the emphasis has been o...

Testing a Point Null Hypothesis against One-Sided for Non Regular and Exponential Families: The Reconcilability Condition to P-values and Posterior Probability

In this paper, the reconcilability between the P-value and the posterior probability in testing a point null hypothesis against the one-sided hypothesis is considered. Two essential families of distributions, non regular and exponential, are studied. It is shown that, for a non regular family of distributions, it is in some cases possible to find a prior distribution function under which P-va...

Stochastic bounds for a single server queue with general retrial times

We propose to use a mathematical method based on stochastic comparisons of Markov chains in order to derive bounds on performance indices. The main goal of this paper is to investigate various monotonicity properties of a single server retrial queue with first-come-first-served (FCFS) orbit and general retrial times using stochastic ordering techniques.

Journal:
  • TinyToCS

Volume 4

Publication date: 2016